Weak Ties: Subtle Role in the Information Diffusion in Online Social Networks
As a form of social media, online social networks play a vital role in the diffusion of social information.
However, due to their unique complexity, the mechanism of diffusion in online social networks
differs from that in other types of networks and remains unclear. Meanwhile, few works
have revealed the coupled dynamics of the structure and the diffusion of online
social networks. To this end, we propose a model to investigate how the structure is
coupled with the diffusion in online social networks from the viewpoint of weak ties. Through numerical
experiments on large-scale online social networks, we find that, in contrast to some previous research
results, preferentially selecting weak ties to republish cannot make the information diffuse quickly,
while random selection can. However, when we gradually remove the weak ties, the
coverage of the information drops sharply even in the case of random selection. We also explain
this through additional analysis and experiments. Finally, we conclude that weak
ties play a subtle role in the information diffusion in online social networks. On the one hand, they act
as bridges that connect isolated local communities and break through the local trapping of
the information. On the other hand, selecting them as preferential paths to republish cannot help
the information spread further in the network. As a result, weak ties might be of use in controlling
virus spread and private information diffusion in real-world applications.

PACS numbers: 89.65.-s, 87.23.Ge, 89.70.-a, 89.75.-k



I. INTRODUCTION 

The emergence of the Internet has radically changed the way
people communicate. In particular, the development
of Web 2.0 applications has led to some extremely
popular online social sites, such as Facebook [1],
Flickr [2], YouTube [3], Twitter [4], LiveJournal [5],
Orkut [6] and Xiaonei [7]. These sites provide a powerful
means of sharing information, finding content and organizing
contacts [8] for ordinary people. Users can consolidate
their existing real-world relationships by
publishing blogs, photos, messages and even statuses. They
can also communicate with strangers on the other side of
the world whom they have never met. Building
on the development and prevalence of the Internet, online
social sites have reshaped the traditional
social network into a new complex system, called the online
social network, which has recently attracted a lot of research
interest as a new form of social media.

Recent works about online social networks mainly fo- 
cus on probing and collecting network topologi es m, Q , 
structural analysis [8]-[11], user interactions [12]-[13] and
content generating patterns [14], [15]. At the same time,
some concepts and methods from traditional social network analysis
have also been introduced into current research. The
strength of ties is one of them. It was
first proposed by Granovetter in his landmark 1973 paper [16],
in which he suggested that the strength of a tie could
be measured by the relative overlap of the neighborhoods
of the two nodes it connects. Interestingly, and contrary
to common sense, he found that loose acquaintances,
known as weak ties, were helpful in finding
a new job [17]. This novel finding has been a hot research
topic for decades. In [18], a predictive model was
proposed to map social media data to tie strength. In
[19], Onnela et al. gave a simple quantitative definition
of the overlap of the neighborhoods of nodes i and j as follows:



$$w_{ij} = \frac{c_{ij}}{(k_i - 1) + (k_j - 1) - c_{ij}}, \tag{1}$$

where c_ij is the number of common acquaintances, and k_i and
k_j are the degrees of i and j, respectively. In this paper,
we define w_ij as the strength of the tie between i
and j. The lower w_ij is, the weaker the tie
between i and j.
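
As a minimal sketch of how the tie strength in Eq. (1) can be evaluated, the snippet below computes w_ij from an adjacency structure. The function name `tie_strength`, the dictionary-of-sets representation and the toy graph are illustrative assumptions, as is the choice to return 0 for an isolated edge whose denominator vanishes.

```python
def tie_strength(adj, i, j):
    """Overlap-based strength w_ij of the tie (i, j), following Eq. (1).

    adj maps each node to the set of its neighbors (undirected graph).
    """
    if j not in adj[i]:
        raise ValueError("i and j are not connected by a tie")
    c_ij = len(adj[i] & adj[j])  # number of common acquaintances
    denom = (len(adj[i]) - 1) + (len(adj[j]) - 1) - c_ij
    # Assumption of this sketch: an isolated edge (k_i = k_j = 1) gets strength 0.
    return c_ij / denom if denom > 0 else 0.0


# Hypothetical toy graph: a triangle a-b-c plus a chain a-d-e.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}

print(tie_strength(adj, "a", "b"))  # one common neighbor (c), denominator 2 -> 0.5
print(tie_strength(adj, "a", "d"))  # no common neighbors -> 0.0
```

In this toy graph the tie a-d has no common acquaintances, so it is the weakest tie and also the only link between the triangle and the node e.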

As a form of social media, the core feature of online social
networks is information diffusion. However, the
mechanism of this diffusion differs from traditional
models, such as Susceptible-Infected-Susceptible (SIS),
Susceptible-Infected-Recovered (SIR) [13, and ran- 
dom walk p2l - [23 | . At the same time, few works have been 
done to reveal the coupled dynamics of both the struc- 
ture and the diffusion of online social networks [1^ [26| . 
To meet this critical challenge, in this paper, we aim to 
investigate the role of weak ties in the information diffu- 
sion in online social networks. 
By monitoring the dynamics of

$$S = \sum_{s < s_{\max}} \frac{n_s s^2}{N}, \tag{2}$$

where n_s is the number of connected clusters with s nodes,
and N is the size of the network, a phase transition was 
found in the mobile communication network during the
removal of weak ties first [19]. We find that this phase
transition is pervasive in online social networks, which 
implies that weak ties play a special role in the struc- 
ture of the network. This interesting finding inspires us 
to investigate the role of weak ties in the information 
diffusion. To this end, we propose a model ID(α, β) to
characterize the mechanism of the information diffusion 
in online social networks and associate the strength of 
ties with the process of spread. Through the simula- 
tions on large-scale real-world data sets, we find that se- 
lecting weak ties preferentially to republish cannot make 
the information diffuse quickly, while the random selec- 
tion can. Nevertheless, further analysis and experiments 
show that the coverage of the information will drop sub- 
stantially during the removal of weak ties even for the 
random diffusion case. So we conclude that weak ties 
play a subtle role in the information diffusion in online 
social networks. We also discuss their potential use for 
the information diffusion control practices. 

The rest of this paper is organized as follows. Section
II introduces the data sets used in this paper. In
Section III we study the structural role of weak ties.
The model ID(α, β) is proposed in Section IV and the
role of weak ties in the information diffusion is then investigated.
Section V discusses the possible uses of weak
ties in the control of virus spread and private information
diffusion. Finally, we give a brief summary in
Section VI.



II. DATA SETS 



We use two data sets in this paper, i.e., YouTube and
Facebook in New Orleans. YouTube is a famous video
sharing site, and Facebook is the most popular online
social site, which allows users to create friendships with
other users, publish blogs, upload photos, send messages,
and update their current statuses on their profile pages.
Both sites have privacy control schemes that
control access to the shared content. The YouTube data
set includes user-to-user links crawled from
YouTube in 2007 [8]. The Facebook data set contains
a list of all the user-to-user links crawled from the New
Orleans regional network in Facebook between December
29th, 2008 and January 3rd, 2009. In both data
sets, we treat the links as undirected.

In these data sets, each node represents a user, while
a tie between two nodes means there is a friendship between
the two users. In general, creating a friendship between
two users requires mutual permission, so
we can formalize each data set as an undirected graph
G(V, E), where V is the set of nodes and E is the
set of ties. We use |V| to denote the size of the
network, and |E| to denote the number of ties. Some
characteristics of the data sets are shown in Table I.
The Cumulative Distribution Function (CDF) of the
strength of ties is shown in Fig. 1.

TABLE I: Data Sets

Data set    |V|        |E|
YouTube     1134890    2987624
Facebook    63392      816886

FIG. 1: (Color online) CDF of the strength of ties.

As we know, online social networks are divided into
two types: knowledge-sharing oriented and networking
oriented. For the data sets we use, YouTube belongs
to the former, while Facebook belongs to the latter; both
are scale-free networks.



III. STRUCTURAL ROLE OF WEAK TIES 

In this section, we study the structural role of weak
ties. As shown in Fig. 2(a) and Fig. 2(c), we find a phase
transition (characterized by S) in online social networks,
similar to the one in [19], when weak ties are removed
first. This phase transition, however, disappears if we remove
the strong ties first. Furthermore, Fig. 2(b) and Fig. 2(d)
show that the relative size of the giant connected
cluster (GCC), denoted by f_GCC, exhibits different
dynamics depending on whether weak ties or
strong ties are removed first. We denote the critical fraction
of removed ties at the phase transition point by f_c. It is
interesting to note that S peaks at f_c = 0.753 for YouTube
and f_c = 0.890 for Facebook, which are very
close to the points where f_GCC ≈ 0.

FIG. 2: (Color online) The variations of S and f_GCC during the removal of weak ties first and strong ties first,
respectively. f_r is the fraction of removed ties.

In the percolation theory, the existence of the above 
phase transition means that the network is collapsed, 
while the network is just shrinking if there is no phase 
transition when removing the ties [19]. So the above experiments
tell us that weak ties play a special role in
the structure of online social networks, which is different 
from the one strong ties play. In fact, they act as the 
important bridges that connect isolated communities. In 
what follows, we build a model that associates the weak 
ties with the information diffusion, to discuss the coupled 
dynamics of the structure and the information diffusion. 
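
The tie-removal experiment described in this section can be sketched as follows. This is only an illustrative implementation under stated assumptions: nodes are labeled 0..N-1, each tie is a hypothetical tuple (u, v, w_uv), the helper names `cluster_stats` and `removal_curve` are invented here, clusters are found with a plain union-find, and S is taken as the sum of n_s s^2 / N over clusters smaller than the largest one.

```python
from collections import Counter


def find(parent, x):
    """Return the root of x in the union-find forest, with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster_stats(n_nodes, edges):
    """Return (f_GCC, S) for the graph on nodes 0..n_nodes-1 with these edges."""
    parent = list(range(n_nodes))
    for u, v in edges:
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[ru] = rv
    sizes = Counter(find(parent, x) for x in range(n_nodes)).values()
    s_max = max(sizes)
    f_gcc = s_max / n_nodes
    # Sum s^2 over all clusters strictly smaller than the largest one.
    s_value = sum(s * s for s in sizes if s < s_max) / n_nodes
    return f_gcc, s_value


def removal_curve(n_nodes, weighted_edges, fractions, weak_first=True):
    """For each removed fraction f_r, return the tuple (f_r, f_GCC, S)."""
    ordered = sorted(weighted_edges, key=lambda e: e[2], reverse=not weak_first)
    curve = []
    for f_r in fractions:
        n_removed = int(round(f_r * len(ordered)))
        kept = [(u, v) for u, v, _ in ordered[n_removed:]]
        curve.append((f_r, *cluster_stats(n_nodes, kept)))
    return curve


# Hypothetical toy network: two triangles joined by one weak bridge (2, 3).
edges = [(0, 1, 0.5), (1, 2, 0.5), (0, 2, 0.5),
         (3, 4, 0.5), (4, 5, 0.5), (3, 5, 0.5), (2, 3, 0.0)]
print(removal_curve(6, edges, [0.0, 1 / 7], weak_first=True))
print(removal_curve(6, edges, [0.0, 1 / 7], weak_first=False))
```

In the toy network, removing the single weakest tie splits the graph into two communities, while removing one strong tie leaves it connected. Rebuilding the clusters for every fraction keeps the sketch short; for networks of the size used here an incremental scheme would be preferable.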



IV. DIFFUSING ROLE OF WEAK TIES 

The information diffusing in online social networks includes
blogs, photos, messages, comments, multimedia
files, statuses, etc. Because of the privacy control and other
features of online social sites, the mechanism of
information diffusion in online social networks is different
from traditional models, such as SIS, SIR and random
walk. We start by discussing the procedure of information
diffusion in online social networks.



A. The Procedure of Information Diffusion 

The procedure of the diffusion in online social networks
can be briefly described as follows:

• The user i publishes the information I, which may
be a photo, a blog, etc.

• Friends of i will know I when they access the profile
page of i or get direct notifications from the
online social site. We call this scheme push.

• Some friends of i (one, many or none) will
comment on, cite or reprint I because they think that
it is interesting, funny or important. We call this
behavior republish.

• The above steps are repeated with i replaced by
each of those who have republished I.

It is easy to find that the key feature of the informa- 
tion diffusion in online social networks is that the infor- 
mation is pushed actively by the site and only part of 
friends will republish it. Take Facebook as an example,
in which News Feed and Live Feed are two significant 
and popular features. News Feed constantly updates a 
user's profile page to list all his or her friends' news in 
Facebook. The news includes conversations taking place 
between the walls of the user's friends, changes of pro- 
file pages, events, and so on ^2l\. Live Feed facilitates 
the users to access the details of the contents updated by 
News Feed. It is updated in a real-time manner after the 
user's login to the web [1^. In fact, News Feed aggre- 
gates the most interesting contents that a user's friends 
are posting, while Live Feed shows to the user all the 
actions his or her friends are taking in Facebook [29].

The feature of pushing and republishing we have dis- 
cussed above is indeed more obvious in Twitter, in which 
all the words you post will be pushed immediately to 
your followers' terminals, including a PC or even a mo- 
bile phone, and then they can republish it if they like. 
However, in real-world situations, the trace of the infor- 
mation is hard to collect [HI, especially for large-scale 
networks. So it is quite reasonable to build a model to 
characterize the mechanism and simulate the diffusion. 



B. The Model for Information Diffusion

Based on the procedure described above, we propose a
simple model ID(α, β), where α is the navigating factor
and β represents the strength of the information. In this
model, α determines how to select neighbors to republish
the information, while β ∈ [0, 1] is a physical character of
the information, which describes how interesting, novel,
important, funny or resounding it is. The model is defined
as follows:

• Step 1: Suppose there comes information I. Set
the state of all the nodes in V to σ_0. The state σ_0
of a node means I is not known to it; otherwise the
state is σ_1.

• Step 2: Randomly select a seed node i from the
network. The degree of i is k_i. Set i to σ_1. It
publishes the information I with strength equal to
β at time T = 0.

• Step 3: Increase the time by one unit, i.e., T =
T + 1. Set each node in the neighborhood of i to
σ_1. Add i to the set of nodes that have published
I, denoted by P. So P = P ∪ {i}.

• Step 4: Calculate the number of nodes that will
republish I in the next round:

$$R_i = \beta k_i. \tag{3}$$

• Step 5: Select one node j from the neighborhood
of i with the probability [30]

$$p_{ij} = \frac{w_{ij}^{\alpha}}{\sum_{l} w_{il}^{\alpha}}, \tag{4}$$

where the sum runs over the neighbors l of i.
If j is not in P, then add it to the set of nodes that
will republish I in the next round, denoted by W.
So W = W ∪ {j}. Repeat this step R_i times.

• Step 6: For each node in W, execute Step 3 to
Step 5 recursively until W is empty or all the nodes
in V have known I.

It is easy to find from Eq. (3) that during the diffusion,
the number of republishing nodes selected from the
neighborhood of i is decided by k_i and β. This is consistent
with the real situation in which a user with more friends
tends to attract more users to visit and republish
the information, and the more interesting or important the
information is, the higher the chance that it will be republished.
We use the parameter α in Eq. (4) to associate the
diffusion with the strength of ties, which means that different
values of α lead to different selections of ties as
paths for republishing information in the next round. In
fact, when α = -1, weak ties are selected preferentially
as paths for republishing. The selection is random
when α = 0, and strong ties are selected with
higher priority when α = 1.



C. Results and Analysis
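
The steps of ID(α, β) can be sketched in a few lines of Python. This is a rough illustration rather than the implementation used for the experiments, and it rests on several assumptions that the text does not fix: the number of selections per publisher is taken as ceil(β k_i), neighbors are drawn with replacement with weight proportional to (w_ij + eps)^α (the small eps is added here only so that α = -1 stays defined for ties with w_ij = 0), and all nodes in W are processed in synchronous rounds. The names `simulate_id`, `strength` and `eps` are hypothetical.

```python
import math
import random


def simulate_id(adj, strength, alpha, beta, seed_node, rng=None, eps=1e-6):
    """Return the coverage C after each round of a sketched ID(alpha, beta) run.

    adj maps each node to the set of its neighbors; strength maps
    frozenset((i, j)) to the tie strength w_ij.
    """
    rng = rng or random.Random()
    known = {seed_node}      # nodes in state sigma_1
    published = set()        # the set P of nodes that have published
    frontier = [seed_node]   # the set W of nodes that publish this round
    coverage = [len(known) / len(adj)]
    while frontier and len(known) < len(adj):
        next_frontier = set()
        for i in frontier:
            if i in published:
                continue
            published.add(i)
            neighbors = sorted(adj[i])
            known.update(neighbors)  # push: every friend of i now knows I
            if not neighbors:
                continue
            r_i = math.ceil(beta * len(neighbors))  # assumed rounding of beta * k_i
            weights = [(strength[frozenset((i, j))] + eps) ** alpha
                       for j in neighbors]
            for j in rng.choices(neighbors, weights=weights, k=r_i):
                if j not in published:
                    next_frontier.add(j)
        frontier = sorted(next_frontier)
        coverage.append(len(known) / len(adj))
    return coverage


# Hypothetical usage on a complete graph of four nodes with equal tie strengths.
adj = {i: {j for j in range(4) if j != i} for i in range(4)}
strength = {frozenset((i, j)): 1.0 for i in range(4) for j in range(i + 1, 4)}
print(simulate_id(adj, strength, alpha=0, beta=0.01, seed_node=0,
                  rng=random.Random(0)))  # [0.25, 1.0]
```

On the complete graph the seed's first push already reaches every node, so the coverage jumps from 1/4 to 1 in one round regardless of α. Sorting the neighbor lists and the frontier only serves to make a seeded run reproducible.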



We define the fraction of nodes with the state σ_1 as the
coverage of I, denoted by C. Since it is found that only
1-2% of friends will republish the information in Flickr [25],
we let β = 0.01 in the simulations. Fig. 3 shows the
numerical experimental results on the Facebook and YouTube
networks. As can be seen, C reaches its maximum when
α = 0. In other words, compared with preferring weak or strong
ties, selecting the republishing nodes randomly from the
neighborhood makes the information spread faster
and wider. This is contrary to our expectation, since
previous studies show that weak ties can facilitate
information diffusion in social networks.

FIG. 3: (Color online) The dynamics of C during the process of the diffusion. We perform the experiments for each
pair of α and β 20 times and return the mean value as the final result.

To understand this, we further explore the process of
the information diffusion in detail. By Eq. (1), we can
easily have

$$\frac{1}{w_{ij}} = \frac{k_i - 2}{c_{ij}} + \frac{k_j}{c_{ij}} - 1.$$

Assume that as k_j increases, c_ij increases proportionately,
i.e., k_j/c_ij = const. Then given a node i and its
neighbor node j, we have

$$k_j \uparrow \;\Rightarrow\; c_{ij} \uparrow \;\Rightarrow\; 1/w_{ij} \downarrow \;\Rightarrow\; w_{ij} \uparrow,$$

and vice versa. This implies that a neighbor node of i
tends to have a higher degree if it has a stronger
tie with i. Therefore, when selecting the republishing
nodes for the next round from the neighborhood, different
values of α will preferentially select nodes with different degrees.
For example, when α = -1, the weak ties will be selected
with higher priority, which means that the nodes
with lower degrees will be selected preferentially. However,
it is easy to learn from Eq. (3) that, for a node
with lower degree, fewer republishing nodes are selected from
its neighborhood, which will eventually reduce
the total number of republishing nodes and impede the
information from further spreading in the network. As to 
the case of selecting strong ties preferentially, although 
it will tend to select the nodes with higher degrees to 
republish, the local trapping [l^ will limit the scope of 
selected nodes into some local areas and make it harder 
to propagate the information further in the network. 

To validate the analysis above, we also observe the fraction
of the nodes that have published I during the diffusion,
denoted by f_pub. As shown in Fig. 4, f_pub increases
more slowly when α = -1, and for each α the time-varying
behavior of f_pub is similar to that of C in Fig. 3.
We also monitor the fraction of
the nodes that have published I at each hop away from
the source node, denoted by f_local. As shown in Fig. 5,
when α = -1, f_local decreases faster than in the other cases, in
particular the α = 0 case. This means that when α = -1, the
number of republishing nodes selected from the neighborhood
decreases sharply as the information spreads
far away from the source, which agrees with our earlier
analysis. As for the case of α = 1, f_pub increases more
and more slowly during the diffusion, because the nodes
selected to republish are trapped in some local clusters.
In other words, it is hard to find new nodes to republish
the information beyond these clusters.

Based on the above results, we can conclude that selecting
weak ties preferentially as the path to republish
information cannot make it diffuse faster. However, this
does not mean that weak ties play a trivial role in the
information diffusion in online social networks, especially
when we recall their special role in the network structure
in Section III. Letting α = 0 in ID(α, β), we compare the
variation of C when weak ties are removed
first with that when strong ties are removed first. As shown in
Fig. 6, when weak ties are removed first, the coverage
of the information decreases rapidly, e.g., from 0.8 to
0.4 in Facebook when the fraction of removed weak ties
reaches about 0.4. This implies that weak ties are indeed
crucial for the coverage of information diffusion in online
social networks.



To further study the effect of β, we conduct experiments
with different β values, as shown in Fig. 7. As can
be seen, no matter what the β value is, random selection
(α = 0) is still the fastest mode for the information diffusion,
although the gap tends to shrink with higher β
values. It is also shown that when β grows, C also
rises for all α values. That is, the greater the strength of
the information is, the more nodes will be attracted to
republish it, and the wider it will spread in the network.

We can therefore conclude that weak ties play a subtle
role in the information diffusion in online social networks. 
On one hand, they are bridges that connect isolated com- 
munities and break through the trapping of information 
in local areas p^. On the other hand, selecting weak ties 
preferentially as the path of republishing cannot make the 
information diffuse faster and wider. 



V. DIFFUSION CONTROL 

The growing popularity of online social networks
does not mean that they are safe and reliable. On the contrary,
virus spread and private information diffusion
have become a massive headache for IT administrators
and users [31], [32]. For example, "Koobface"
is a Trojan worm on Facebook, which spreads by leaving
a comment on the profile pages of the victim's friends to lure
a click on the malicious link [33]. About 63% of system
administrators worry that their employees will share too
much private information online [34]. So as time goes
by, it becomes more and more important and urgent to
control virus spread and private information diffusion
in online social networks.

In light of this, we can make use of weak ties
for information diffusion control. That is, in real-world
practice, we can assume that the behavior of republishing
information is random, i.e., α = 0. Then, according
to the results in Fig. 6, we can trap the virus or
the private information in local communities by
removing weak ties and stop it from diffusing further
in the network.

FIG. 4: (Color online) The dynamics of f_pub during the process of the diffusion. We perform the experiments for
each pair of α and β 20 times and return the mean value as the final result.

FIG. 5: (Color online) The dynamics of f_local as the information propagates away from the source. We
perform each experiment 20 times and take the mean value as the final result.



VI. SUMMARY 

Online social sites have become one of the most popular
Web 2.0 applications on the Internet. As a new form of social
media, the core feature of online social networks is
information diffusion. We investigate the coupled dynamics
of the structure and the information diffusion from the viewpoint
of weak ties. Different from the recent work [25], we do
not focus on the trace collection and analysis of the real
data flowing in the network. Instead, inspired by [19],
we propose a model for online social networks and take
a closer look at the role of weak ties in the diffusion.

We find that the phase transition found in the mobile
communication network exists pervasively in online
social networks, which means that weak ties play a
special role in the network structure. Then we propose
a new model ID(α, β), which associates the strength of
ties with the diffusion, to simulate how the information
spreads in online social networks. Contrary to our expectation,
selecting weak ties preferentially to republish
cannot facilitate the information diffusion in the network,
while random selection can. Through additional analysis
and experiments, we find that when α = -1, the nodes
with lower degrees are preferentially selected for republishing,
which limits the scope of the distribution of republishing
nodes in the following rounds. However, even
for the random selection case, removing weak ties
makes the coverage of the information decrease sharply,
which is consistent with their special role in the structure.



So we conclude that weak ties play a subtle role in the
information diffusion in online social networks. On the one
hand, they act as bridges that connect isolated
communities and break through the trapping of information
in local areas. On the other hand, selecting weak ties
preferentially to republish cannot make the information
diffuse faster in the network. As for potential applications,
we think that weak ties might be of use in the control
of virus spread and private information diffusion.
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As a social media, online social networks play a vital role in the social information diffusion. 
However, due to its unique complexity, the mechanism of the diffusion in online social networks is 
different from the ones in other types of networks and remains unclear to us. Meanwhile, few works 
have been done to reveal the coupled dynamics of both the structure and the diffusion of online 
social networks. To this end, in this paper, we propose a model to investigate how the structure is 
coupled with the diffusion in online social networks from the view of weak ties. Through numerical 
experiments on large-scale online social networks, we find that in contrast to some previous research 
results, selecting weak ties preferentially to republish cannot make the information diffuse quickly, 
while random selection can achieve this goal. However, when we remove the weak ties gradually, the 
coverage of the information will drop sharply even in the case of random selection. We also give a 
reasonable explanation for this by extra analysis and experiments. Finally, we conclude that weak 
ties play a subtle role in the information diffusion in online social networks. On one hand, they act 
as bridges to connect isolated local communities together and break through the local trapping of 
the information. On the other hand, selecting them as preferential paths to republish cannot help 
the information spread further in the network. As a result, weak ties might be of use in the control 
of the virus spread and the private information diffusion in real-world applications. 



PACS numbers: 89.65.-s, 87.23. Ge, 89.70.-a, 89.75.-k 



I. INTRODUCTION 

The emergence of the Internet has changed the way 
of communication radically and, especially, the devel- 
opment of Web 2.0 applications has led to some ex- 
tremely popular online social sites, such as Facebook 1|, 
Flickr a, YouTube ^, Twitter [ij, Live Journal [g, 
Orkut [y] and Xiaonei [3| . These sites provide a powerful 
means of sharing information, finding content and orga- 
nizing contacts [8,] for ordinary people. Users can consoli- 
date their existing relationships in the real world through 
publishing blogs, photos, messages and even states. They 
also have a chance to communicate with strangers that 
they have never met on the other end of the world. Based 
on the development and prevalence of the Internet, online 
social sites have reformed the structure of the traditional 
social network to a new complex system, called the online 
social network, which attracts a lot of research interests 
recently as a new social media. 

Recent works about online social networks mainly fo- 
cus on probing and collecting network topologi es m, Q , 
structural analysis [8l-[Tl|. user interactions [l2| - [T3 | and 



content generating patterns [T^[TB|- At the same time, 
some concepts and methods of traditional social networks 
have also been introduced into current researches: The 
strength of ties is one of them. The strength of ties was 
first proposed by Granovetter in his landmark paper [l^ 
in 1973, in which he thought the strength of ties could 
be measured by the relative overlap of the neighborhood 
of two nodes in the network. It was interesting that dif- 
ferent from the common sense, he found that loose ac- 
quaintances, known as weak ties, were helpful in finding 
a new job jl7| . This novel finding has become a hot topic 
of research for decades. In ^ISf], a predictive model was 
proposed to map social media data to the tie strength. In 
|19| , Onnela et al. gave a simple but quantified definition 
to the overlap of neighbors of nodes i and j as follows: 



(1) 



ki 1 ^ kj 1 
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where Cy is the number of common acquaintances, ki and 
kj are the degrees of i and j, respectively. In this paper, 
we define Wij as the strength of the tie between i 
and j. The lower Wij is, the weaker the strength of tie 
between i and j is. 

As a social media, the core feature of online social 
networks is the information diffusion. However, the 
mechanism of the diffusion is different from traditional 
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models, such as Susceptible-Infected-Susceptible (SIS), 
Susceptible-Infected-Recovered (SIR) [13, and ran- 
dom walk p2l - [23 | . At the same time, few works have been 
done to reveal the coupled dynamics of both the struc- 
ture and the diffusion of online social networks [1^ [26| . 
To meet this critical challenge, in this paper, we aim to 
investigate the role of weak ties in the information diffu- 
sion in online social networks. 
By monitoring the dynamics of 



S 



E 



s<s„ 



N 



(2) 



where n is the number of connected clusters with S nodes, 
and N is the size of the network, a phase transition was 
found in the mobile communication network during the 
removal of weak ties first [l^. We find that this phase 
transition is pervasive in online social networks, which 
implies that weak ties play a special role in the struc- 
ture of the network. This interesting finding inspires us 
to investigate the role of weak ties in the information 
diffusion. To this end, we propose a model ID{a,/3) to 
characterize the mechanism of the information diffusion 
in online social networks and associate the strength of 
ties with the process of spread. Through the simula- 
tions on large-scale real-world data sets, we find that se- 
lecting weak ties preferentially to republish cannot make 
the information diffuse quickly, while the random selec- 
tion can. Nevertheless, further analysis and experiments 
show that the coverage of the information will drop sub- 
stantially during the removal of weak ties even for the 
random diffusion case. So we conclude that weak ties 
play a subtle role in the information diffusion in online 
social networks. We also discuss their potential use for 
the information diffusion control practices. 

The rest of this paper is organized as follows. Sec- 
tion [n] introduces the data sets used in this paper. In 
Section lllli we study the structural role of weak ties. 
The model ID{a,(3) is proposed in Section HVl and the 
role of weak ties in the information diffusion is then in- 
vestigated. Section |V] discusses the possible uses of weak 
ties in the control of the virus spread and the private in- 
formation diffusion. Finally, we give a brief summary in 
Section IVll 



II. DATA SETS 



TABLE I: Data Sets 


Data set 


1^1 


\E\ 


YouTube 


1134890 


2987624 


Facebook 


63392 


816886 



We use two data sets in this paper, i.e., YouTube and
Facebook in New Orleans. YouTube is a famous video
sharing site, and Facebook is the most popular online
social site which allows users to create friendships with
other users, publish blogs, upload photos, send messages,
and update their current states on their profile pages.
All these sites have some privacy control schemes which
control the access to the shared contents. The data
set of YouTube includes user-to-user links crawled from
YouTube in 2007 [8]. The data set of Facebook contains
a list of all the user-to-user links crawled from the New
Orleans regional network in Facebook between December
29th, 2008 and January 3rd, 2009. In both data sets, we
treat the links as undirected.

In these data sets, each node represents a user, while
a tie between two nodes means there is a friendship
between two users. In general, creating a friendship
between two users always needs mutual permission. So
we can formalize each data set as an undirected graph
$G(V, E)$, where $V$ is the set of nodes and $E$ is the
set of ties. We use $|V|$ to denote the size of the
network, and $|E|$ to denote the number of ties. Some
characteristics of the data sets are shown in Table I.
The cumulative distribution function (CDF) of the
strength of ties is shown in Fig. 1.

FIG. 1: (Color online) CDF of the strength of ties.

As we know, online social networks are divided into
two types: knowledge-sharing oriented and networking
oriented. For the data sets we use, YouTube belongs
to the former, while Facebook belongs to the latter. Both
of them are scale-free networks.



III. STRUCTURAL ROLE OF WEAK TIES 

In this section, we study the structural role of weak
ties. As shown in Fig. 2(a) and Fig. 2(c), we find a phase
transition (characterized by $S$) in online social networks,
similar to the one in [l^, when weak ties are removed
first. This phase transition, however, disappears if we
remove the strong ties first. Furthermore, Fig. 2(b) and
Fig. 2(d) show that the relative size of the giant connected
cluster (GCC), denoted by $f_{GCC}$, exhibits different
dynamics depending on whether weak ties or strong ties are
removed first. We denote the critical fraction of removed
ties at the phase transition point by $f_c$. It is interesting
to note that $S$ reaches its peak at $f_c = 0.753$ for YouTube
and $f_c = 0.890$ for Facebook, which are very close to the
points where $f_{GCC} \approx 0$.

FIG. 2: (Color online) The variations of $S$ and $f_{GCC}$ during the removal of weak ties first and strong ties first,
respectively. $f_r$ is the fraction of removed ties.

In percolation theory, the existence of the above
phase transition means that the network collapses,
while the network merely shrinks if there is no phase
transition when removing the ties [19]. So the above
experiments tell us that weak ties play a special role in
the structure of online social networks, which is different
from the one strong ties play. In fact, they act as the
important bridges that connect isolated communities. In
what follows, we build a model that associates the weak
ties with the information diffusion, to discuss the coupled
dynamics of the structure and the information diffusion.
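
The removal experiment described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the experiments: the function names (`build_adjacency`, `tie_strengths`, `gcc_fraction`, `removal_curve`) and the toy graph are hypothetical, and the strength of a tie is assumed to be the overlap-based measure $w_{ij} = c_{ij} / (k_i - 1 + k_j - 1 - c_{ij})$, where $c_{ij}$ is the number of common neighbors of $i$ and $j$. Only $f_{GCC}$ is tracked here; the quantity $S$ is not computed.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def tie_strengths(edges):
    """Assumed strength: w_ij = c_ij / (k_i - 1 + k_j - 1 - c_ij)."""
    adj = build_adjacency(edges)
    strengths = {}
    for u, v in edges:
        common = len(adj[u] & adj[v])
        denom = len(adj[u]) - 1 + len(adj[v]) - 1 - common
        # A tie whose endpoints have no other neighbors gets strength 0.
        strengths[(u, v)] = common / denom if denom > 0 else 0.0
    return strengths


def gcc_fraction(nodes, edges):
    """Relative size of the giant connected cluster, via union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    sizes = defaultdict(int)
    for n in nodes:
        sizes[find(n)] += 1
    return max(sizes.values()) / len(nodes)


def removal_curve(edges, weak_first=True, steps=4):
    """Return (f_r, f_GCC) pairs for f_r = 0, 1/steps, ..., 1."""
    strengths = tie_strengths(edges)
    nodes = {n for e in edges for n in e}
    ordered = sorted(edges, key=lambda e: strengths[e],
                     reverse=not weak_first)
    curve = []
    for s in range(steps + 1):
        removed = round(len(ordered) * s / steps)
        curve.append((s / steps, gcc_fraction(nodes, ordered[removed:])))
    return curve


# Toy graph: two triangles joined by a single bridging tie (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
weak_curve = removal_curve(toy_edges, weak_first=True, steps=7)
strong_curve = removal_curve(toy_edges, weak_first=False, steps=7)
```

In the toy graph the bridging tie has no common neighbors and is therefore the weakest tie, so removing weak ties first splits the graph after a single removal, while removing strong ties first leaves it connected at the same step.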



IV. DIFFUSING ROLE OF WEAK TIES 

The information diffusing in online social networks
includes blogs, photos, messages, comments, multimedia
files, states, etc. Because of the privacy control and other
features of online social sites, the mechanism of the
information diffusion in online social networks is different
from traditional models, such as SIS, SIR and random
walk. We start by discussing the procedure of information
diffusion in online social networks.



A. The Procedure of Information Diffusion 

The procedure of the diffusion in online social networks
can be briefly described as follows:

• The user $i$ publishes the information $I$, which may
be a photo, a blog, etc.

• Friends of $i$ will know $I$ when they access the profile
page of $i$ or get some direct notifications from the
online social site. We call this scheme push.

• Some friends of $i$, maybe one, many or none, will
comment on, cite or reprint $I$, because they think that
it is interesting, funny or important. We call this
behavior republish.

• The above steps will be repeated with $i$ replaced by
each of those who have republished $I$.

It is easy to find that the key feature of the information
diffusion in online social networks is that the information
is pushed actively by the site and only some of the
friends will republish it. Take Facebook as an example,
in which News Feed and Live Feed are two significant
and popular features. News Feed constantly updates a
user's profile page to list all his or her friends' news in
Facebook. The news includes conversations taking place
between the walls of the user's friends, changes of profile
pages, events, and so on ^2l\. Live Feed helps users
access the details of the contents updated by
News Feed. It is updated in real time after the
user logs in to the site [1^. In fact, News Feed aggregates
the most interesting contents that a user's friends
are posting, while Live Feed shows the user all the
actions his or her friends are taking in Facebook [29].

The feature of pushing and republishing we have dis- 
cussed above is indeed more obvious in Twitter, in which 
all the words you post will be pushed immediately to 
your followers' terminals, including a PC or even a mo- 
bile phone, and then they can republish it if they like. 
However, in real-world situations, the trace of the infor- 
mation is hard to collect [HI, especially for large-scale 
networks. So it is quite reasonable to build a model to 
characterize the mechanism and simulate the diffusion. 



B. The Model for Information Diffusion

Based on the procedure described above, we propose a
simple model $ID(\alpha, \beta)$, where $\alpha$ is the navigating
factor and $\beta$ represents the strength of the information. In
this model, $\alpha$ determines how to select neighbors to republish
the information, while $\beta \in [0, 1]$ is a physical character of
the information, which describes how interesting, novel,
important, funny or resounding it is. The model is defined
as follows:

• Step 1: Suppose there comes information $I$. Set
the state of all the nodes in $V$ to $\sigma_0$. The state $\sigma_0$
of a node means $I$ is not known to it, otherwise the
state is $\sigma_1$.

• Step 2: Randomly select a seed node $i$ from the
network. The degree of $i$ is $k_i$. Set $i$ to $\sigma_1$. It
publishes the information $I$ with strength equal to
$\beta$ at time $T = 0$.

• Step 3: Increase the time by one unit, i.e., $T =
T + 1$. Set each node in the neighborhood of $i$ to
$\sigma_1$. Add $i$ to the set of nodes that have published
$I$, denoted by $P$. So $P = P \cup \{i\}$.

• Step 4: Calculate the number of nodes that will
republish $I$ in the next round:

$$R_i = k_i \beta. \qquad (3)$$

• Step 5: Select one node $j$ from the neighborhood
of $i$ with the probability [30]

$$p_{ij} = \frac{w_{ij}^{\alpha}}{\sum_{l} w_{il}^{\alpha}}, \qquad (4)$$

where the sum runs over the neighbors $l$ of $i$.
If $j$ is not in $P$, then add it to the set of nodes that
will republish $I$ in the next round, denoted by $W$.
So $W = W \cup \{j\}$. Repeat this step $R_i$ times.

• Step 6: For each node in $W$, execute Step 3 to
Step 5 recursively until $W$ is empty or all the nodes
in $V$ have known $I$.

It is easy to find from Eq. (3) that during the diffusion,
the number of republishing nodes selected from the
neighborhood of $i$ is decided by $k_i$ and $\beta$. It is consistent
with the real situation that a user with more friends
tends to attract more other users to visit and republish
the information. The more interesting or important the
information is, the higher the chance that it will be
republished. We use the parameter $\alpha$ in Eq. (4) to associate the
diffusion with the strength of the ties, which means that
different values of $\alpha$ will lead to different selections of ties as
paths for republishing information in the next round. In
fact, when $\alpha = -1$, weak ties are selected preferentially
as paths for republishing. The selection is random
when $\alpha = 0$, and the strong ties are selected with
higher priority when $\alpha = 1$.



C. Results and Analysis



We define the fraction of nodes with the state $\sigma_1$ as the
coverage of $I$, denoted by $C$. Since it is found that only
1-2% of friends will republish the information in Flickr [25],
we let $\beta = 0.01$ in the simulations. Fig. 3 shows the
numerical experimental results on the Facebook and YouTube
networks. As can be seen, $C$ reaches the maximum when
$\alpha = 0$. In other words, compared with preferring weak or
strong ties, selecting the republishing nodes randomly from the
neighborhood makes the information spread faster
and wider. This is contrary to our expectation, since
previous studies show that weak ties can facilitate the
information diffusion in social networks.
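
The simulation of $ID(\alpha, \beta)$ and the measurement of the coverage $C$ can be sketched in Python as follows. This is a minimal sketch under explicit assumptions, not the implementation used for the reported results. The names `simulate_id`, `overlap_strength` and `eps` and the toy graph are hypothetical. The number of republishing nodes is assumed to be $\lceil \beta k_i \rceil$ (the rounding is an assumption), the selection probability is assumed to be proportional to $(w_{ij} + \epsilon)^{\alpha}$ so that zero-strength ties remain selectable when $\alpha < 0$, one round over all currently publishing nodes is counted as one time unit, and the strength of a tie is assumed to be the overlap-based measure $w_{ij} = c_{ij} / (k_i - 1 + k_j - 1 - c_{ij})$.

```python
import math
import random


def overlap_strength(adj):
    """Assumed tie strength w_ij = c_ij / (k_i - 1 + k_j - 1 - c_ij)."""
    w = {}
    for i in adj:
        for j in adj[i]:
            c = len(adj[i] & adj[j])
            denom = len(adj[i]) - 1 + len(adj[j]) - 1 - c
            w[frozenset((i, j))] = c / denom if denom > 0 else 0.0
    return w


def simulate_id(adj, strength, alpha, beta, seed_node, rng, eps=1e-9):
    """Sketch of ID(alpha, beta); returns the coverage C after each round.

    adj: {node: set(neighbors)}; strength: {frozenset({i, j}): w_ij}.
    """
    known = {seed_node}      # nodes in state sigma_1
    published = set()        # the set P
    current = [seed_node]    # nodes that publish in this round
    coverage = [len(known) / len(adj)]
    while current and len(known) < len(adj):
        next_round = set()   # the set W
        for i in current:
            if i in published:
                continue
            published.add(i)
            neighbors = sorted(adj[i])
            known.update(neighbors)   # push I to every friend of i
            if not neighbors:
                continue
            # Number of republishers; the ceiling is an assumption.
            r_i = math.ceil(beta * len(neighbors))
            # Selection weights proportional to (w_ij + eps) ** alpha.
            weights = [(strength[frozenset((i, j))] + eps) ** alpha
                       for j in neighbors]
            for j in rng.choices(neighbors, weights=weights, k=r_i):
                if j not in published:
                    next_round.add(j)
        current = sorted(next_round)
        coverage.append(len(known) / len(adj))
    return coverage


# Toy graph: two triangles joined by the tie between nodes 2 and 3.
toy_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
           3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
toy_w = overlap_strength(toy_adj)
curve = simulate_id(toy_adj, toy_w, alpha=0, beta=0.5,
                    seed_node=0, rng=random.Random(1))
```

Averaging the returned curves over repeated runs with different seed nodes, for $\alpha \in \{-1, 0, 1\}$, would give coverage dynamics of the kind discussed here, although the toy graph is far too small to reproduce the reported behavior.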

To understand this, we further explore the process of
the information diffusion in detail. By Eq. (1), we can
easily have

$$\frac{1}{w_{ij}} = \frac{k_i - 2}{c_{ij}} + \frac{k_j}{c_{ij}} - 1.$$



Assume that as $k_j$ increases, $c_{ij}$ increases proportionately,
i.e., $k_j / c_{ij} = \mathrm{const}$. Then given a node $i$ and its
neighbor node $j$, we have $k_j \uparrow \Rightarrow c_{ij} \uparrow
\Rightarrow 1/w_{ij} \downarrow \Rightarrow w_{ij} \uparrow$,
and vice versa. This implies that a neighbor node of $i$
tends to have a higher degree if it has a stronger tie
with $i$. Therefore, when selecting the republishing
nodes for the next round from the neighborhood, different
$\alpha$ will select nodes with different degrees preferentially.
For example, when $\alpha = -1$, the weak ties will be
selected with higher priority, which means that the nodes
with lower degrees will be selected preferentially. However,
it is easy to learn from Eq. (3) that, for a node
with lower degree, fewer republishing nodes will be selected
from its neighborhood, which will eventually reduce
the total number of republishing nodes and impede the
information from spreading further in the network. As to
the case of selecting strong ties preferentially, although
it tends to select nodes with higher degrees to
republish, the local trapping [l^ will confine the
selected nodes to some local areas and make it harder
to propagate the information further in the network.

FIG. 3: (Color online) The dynamics of $C$ during the process of the diffusion. We perform the experiments for each
pair of $\alpha$ and $\beta$ 20 times and return the mean value as the final result.
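
The degree bias argued above can be illustrated with a small numerical sketch. It assumes a selection probability proportional to $w_{ij}^{\alpha}$; the function name `selection_probabilities` and the neighborhood values (degrees and strengths) are hypothetical and chosen only so that stronger ties lead to higher-degree neighbors, as the assumption $k_j / c_{ij} = \mathrm{const}$ suggests.

```python
def selection_probabilities(strengths, alpha):
    """Probabilities proportional to w ** alpha over the neighbors of a node."""
    weights = [w ** alpha for w in strengths]
    total = sum(weights)
    return [x / total for x in weights]


# Hypothetical neighborhood of a node i: (degree k_j, tie strength w_ij).
neighbors = [(3, 0.1), (10, 0.2), (40, 0.4)]
strengths = [w for _, w in neighbors]

for alpha in (-1, 0, 1):
    probs = selection_probabilities(strengths, alpha)
    # Expected degree of the neighbor chosen to republish.
    expected_degree = sum(p * k for p, (k, _) in zip(probs, neighbors))
    print(alpha, [round(p, 3) for p in probs], round(expected_degree, 2))
```

With these values the expected degree of the selected neighbor is lowest for $\alpha = -1$ and highest for $\alpha = 1$, which is the bias the argument relies on. The sketch requires strictly positive strengths, since a zero strength cannot be raised to a negative power.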

To validate the analysis above, we also observe the fraction
of the nodes that have published $I$ during the diffusion,
denoted by $f_{pub}$. As shown in Fig. 4, $f_{pub}$ increases
more slowly when $\alpha = -1$, and the time-varying properties
of $f_{pub}$ are similar to those of $C$ in Fig. 3 for the
respective $\alpha$ values. We also monitor the fraction of
the nodes that have published $I$ in each hop away from
the source node, denoted by $f_{local}$. As shown in Fig. 5,
when $\alpha = -1$, $f_{local}$ decreases faster than in the other
cases, in particular the $\alpha = 0$ case. It means that when
$\alpha = -1$, the number of republishing nodes selected from the
neighborhood decreases sharply as the information spreads
far away from the source, which agrees with our former
analysis. As for the case of $\alpha = 1$, $f_{pub}$ increases more
and more slowly during the diffusion, because the nodes
selected to republish are trapped in some local clusters.
In other words, it is hard to find new nodes that
republish the information beyond these clusters.

Based on the above results, we can conclude that selecting
weak ties preferentially as the path to republish
information cannot make it diffuse faster. However, this
does not mean that weak ties play a trivial role in the
information diffusion in online social networks, especially
when we recall their special role in the network structure
in Section III. Letting $\alpha = 0$ in $ID(\alpha, \beta)$, we compare
the variation of $C$ when weak ties are removed
first with that when strong ties are removed first. As shown in
Fig. 6, when weak ties are removed first, the coverage
of the information decreases rapidly, e.g., from 0.8 to
0.4 in Facebook when the fraction of removed weak ties
reaches about 0.4. This implies that weak ties are indeed
crucial for the coverage of information diffusion in online
social networks.



To further study the effect of $\beta$, we conduct experiments
with different $\beta$ values, as shown in Fig. 7. As can
be seen, no matter what the $\beta$ value is, random selection
($\alpha = 0$) is still the fastest mode for the information
diffusion, although the gap tends to shrink with higher $\beta$
values. It is also shown that when $\beta$ grows, $C$ also
rises for all $\alpha$ values. That is, the greater the strength of
the information is, the more nodes will be attracted to
republish it, and the wider it will spread in the network.

So far, we can conclude that weak ties play a subtle
role in the information diffusion in online social networks.
On one hand, they are bridges that connect isolated
communities and break through the trapping of information
in local areas p^. On the other hand, selecting weak ties
preferentially as the path of republishing cannot make the
information diffuse faster and wider.



V. DIFFUSION CONTROL 

The growing popularity of online social networks
does not mean that they are safe and reliable. On the
contrary, the virus spread and the private information
diffusion have become a massive headache for IT
administrators and users [31], [32]. For example, "Koobface"
is a Trojan worm on Facebook, which spreads by leaving
a comment on the profile pages of the victim's friends to
lure them into clicking a malicious link [33]. About 63% of
system administrators worry that their employees will share too
much private information online [34]. So as time goes
by, it becomes more and more important and urgent to
control the virus spread and the private information
diffusion in online social networks.

In the light of this, we can make use of the weak ties
for the information diffusion control. That is, in real-world
practice, we can assume that the behavior of republishing
information is random, i.e., $\alpha = 0$. Then, according
to the results in Fig. 6, we can trap the virus or
the private information in local communities by
removing weak ties and stop them from diffusing further
in the network.

FIG. 4: (Color online) The dynamics of $f_{pub}$ during the process of the diffusion. We perform the experiments for
each pair of $\alpha$ and $\beta$ 20 times and return the mean value as the final result.

FIG. 5: (Color online) The dynamics of $f_{local}$ during the information propagation far away from the source. We
perform each experiment 20 times and get the mean value as the final result.



VI. SUMMARY 

Online social sites have become one of the most popular
Web 2.0 applications on the Internet. As a new social
medium, the core feature of online social networks is the
information diffusion. We investigate the coupled dynamics
of the structure and the information diffusion from the view
of weak ties. Different from the recent work [25], we do
not focus on the trace collection and analysis of the real
data flowing in the network. Instead, inspired by [l^,
we propose a model for online social networks and take
a closer look at the role of weak ties in the diffusion.

We find that the phase transition found in the mobile
communication network exists pervasively in online
social networks, which means that the weak ties play a
special role in the network structure. Then we propose
a new model $ID(\alpha, \beta)$, which associates the strength of
ties with the diffusion, to simulate how the information
spreads in online social networks. Contrary to our
expectation, selecting weak ties preferentially to republish
cannot facilitate the information diffusion in the network,
while the random selection can. Through extra analysis
and experiments, we find that when $\alpha = -1$, the nodes
with lower degrees are preferentially selected for
republishing, which will limit the scope of the distribution of
republishing nodes in the following rounds. However, even
for the random selection case, removing weak ties
makes the coverage of the information decrease sharply,
which is consistent with their special role in the structure.



So we conclude that weak ties play a subtle role in the
information diffusion in online social networks. On one
hand, they act as bridges, which connect isolated
communities and break through the trapping of information
in local areas. On the other hand, selecting weak ties
preferentially to republish cannot make the information
diffuse faster in the network. For potential applications,
we think that weak ties might be of use in the control
of the virus spread and the private information diffusion.